Preventing distribution of modified or corrupted files

ABSTRACT

An administrator node (130) adjusts a trustworthy-measure associated with nodes (110) that are suspected of unauthorized modifications of content material. The original provider of the content material to a network binds an identifying code to it. Upon receiving the material from a source node (110), a target node (120) computes an associated code for the received material. If the computed code and the identifying code differ, the material is determined to be modified, and a discrepancy report is submitted to the administrator node (130). The administrator node (130) effects a penalty against the root source if the modification is confirmed; or against the target node (120) if the discrepancy report is unfounded. The penalties include downgrading of the trustworthiness-measure associated with each node, and these trustworthiness-measures are available for use by potential target nodes in their selection of preferred source nodes.

CROSS REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. provisional application Ser. No. 60/440,447 filed 16 Jan. 2003, which is incorporated herein by reference.

This invention relates to the field of computer communications, and in particular to a method and system for controlling the distribution of modified or corrupted files via a distributed communications network.

In a distributed communications network, any node in the network may be a source of information content; as such, the integrity of the information content is questionable. A first user may, for example, download a song from a second user's system, and a third user may obtain a copy of the song from the first user; a fourth user may obtain a copy from the third user, and so on. If the first user's system has a virus that corrupts the contents of the file containing the song, the third, fourth, and subsequent users may receive a corrupted copy of the song, and may transfer this corrupted copy to yet other users. In like manner, the first user may have intentionally corrupted the song.

In a typical distributed network, a user identifies which files are available for distribution to other users. To facilitate the distribution of such files, an administrator node on the network typically provides and maintains a catalog of available files, and their location in the network. In a song-distribution network, for example, the catalog will generally contain the title of the song, the name of the artist, and the node from which this song can be downloaded. Often, copies of the same song will be available from a variety of nodes. Ideally, because the songs are digitally recorded, each copy of the same song is identical. However, if one of the copies is corrupted, or becomes corrupted, it may be distributed to many users before the problem is discovered, and some of these users may offer the as-yet-undiscovered corrupt file as a catalog entry. Thereafter, the integrity of any copy of the song from the catalog becomes questionable.

It is an object of this invention to provide a method and system for identifying modified or corrupted information content. It is another object of this invention to provide a method and system for identifying the source of the modification/corruption of the information content. It is another object of this invention to provide a method and system for resolving conflicts regarding whether the information content has been modified/corrupted, and if so, the source of this modification/corruption.

These objects, and others, are achieved by a method and system that includes a detection scheme and a reporting scheme. The original provider of content material to a network binds an identifying code to the material. When the material is received by a target node from a source node, the target node computes an associated code for this received material. If the computed code and the identifying code correspond, the material is determined to be as-provided by the original provider. If the computed code and the identifying code differ, the material is determined to be modified, and a discrepancy report is submitted to an administrator node. In like manner, if the content material is determined to be corrupted, or otherwise different than expected, a discrepancy report is submitted to the administrator node. The administrator node attempts to determine the root source of the modification or corruption, and effects a penalty against the root source if the modification or corruption is confirmed. Optionally, a penalty may be effected against the target node if the discrepancy report is unfounded. The penalties include downgrading of a trustworthiness-measure associated with each node in the network, and these trustworthiness-measures are available for use by potential target nodes in their selection of preferred source nodes.

FIG. 1 illustrates an example block diagram of a modification-monitoring system 100 in accordance with this invention.

FIGS. 2A-2B illustrate example flow diagrams of a modification-monitoring process in accordance with this invention.

FIG. 3 illustrates an example flow diagram of a conflict-localization process in accordance with this invention.

FIG. 4 illustrates an example flow diagram of a conflict-resolution process in accordance with this invention.

Throughout the drawings, the same reference numeral refers to the same element, or an element that performs substantially the same function.

This invention is based on the observation that the same information content may be available from a variety of sources within a network, or external to the network. By distinguishing nodes that are more likely to provide corrupted information content, other nodes on the network can be configured to avoid these nodes when seeking to download new information content, thereby reducing the proliferation of corrupted information content.

FIG. 1 illustrates an example block diagram of a modification-monitoring system 100 in accordance with this invention. A target node 120 initiates a transfer/download of an information file from a source node 110.

In accordance with this invention, each information file has an associated identifying code that is determined from the content of the information file. This identifying code may be, for example, a control-sum-code (CSC) that is based on a sum of the bytes within the information file, a hash value that is based on a transformation of the bytes within the file, or another parameter whose value is determined by the contents of the file. Preferably, a one-way code is used, such that the value of the code changes in an unpredictable manner when the contents of the file are modified.
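The two kinds of identifying code mentioned above can be sketched as follows. This is a minimal illustration only: the 32-bit reduction of the byte sum and the choice of SHA-256 as the one-way code are assumptions made for the example, and the function names are hypothetical; the description does not name a specific algorithm.

```python
import hashlib


def control_sum_code(data: bytes, modulus: int = 2**32) -> int:
    """Control-sum-code: sum of all bytes, reduced to 32 bits (width is an assumption)."""
    return sum(data) % modulus


def hash_code(data: bytes) -> str:
    """One-way identifying code; SHA-256 is an illustrative choice, not named by the source."""
    return hashlib.sha256(data).hexdigest()


original = b"example song bytes"
reordered = b"example song bytse"  # same bytes, with the last two swapped

# A plain byte sum is blind to reordering; the one-way hash is not.
same_sum = control_sum_code(original) == control_sum_code(reordered)
same_hash = hash_code(original) == hash_code(reordered)
```

Here the reordered content leaves the byte sum unchanged but alters the hash, which suggests why a one-way code is stated to be preferable.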

The identifying code is associated with the information file when the information file is first introduced to the network. If a node in the network creates the information file, the node also creates the identifying code when the information file is created and/or made available to other nodes on the network. Alternatively, if a node in the network imports the information file from an external source, and the external source does not provide the identifying code, the receiving node creates the identifying code when the information file is received and made available to other nodes on the network. Note that, due to a variety of factors, such as sample rate differences, minor length differences, and so on, different recordings or different sources of the same song may have different identifying codes. Conversely, downloaded digital copies of the same song have identical identifying codes.

When the target node 120 receives the information file and its corresponding identifying code, the target node 120 independently computes a code for the received information file, using the same algorithm that was used to create the original identifying code. If the newly computed code corresponds to the received identifying code, the target node 120 concludes that the information file has not been modified. If, on the other hand, the newly computed code does not correspond to the received identifying code, the target node 120 concludes that the information file has been modified, either at the source node 110, or via the communication channel from the source 110 to the target 120. The target node 120 repeats the above process to distinguish whether the cause of the modification is the communication channel.

In accordance with this invention, when the target node 120 concludes that the communication channel is not the cause of the discrepancy between the newly computed code and the original identifying code, the target node 120 reports the discrepancy to an administrator node 130 for subsequent actions. The administrator node 130 determines the validity of the reported discrepancy, as detailed below, and penalizes the source node 110 if the source node 110 is deemed to be the cause of the modification to the information file.

Also in accordance with this invention, if the computed code matches the identifying code, but the target node 120 subsequently discovers a corruption of the information file, such as a song or video with excessive distortion, or a song or video that does not correspond to the title or author associated with the file, or other different-than-expected content, the target node 120 reports the discrepancy to the administrator node 130 for subsequent action, as detailed below.

Generally, the penalty imposed by the administrator node is a degradation of a trustworthy-measure associated with the source node 110. Thereafter, other nodes can access the trustworthy-measure associated with each of the nodes in the network to determine which nodes to use as a source for information files. In a preferred embodiment of this invention, the aforementioned catalog of available files includes this trustworthy-measure for each source, or a rating of each source based on its trustworthy-measure, such as a red (danger), yellow (caution), or green (safe) shading of each source icon. Also in a preferred embodiment, the identifying code from the originating node is included in the catalog, to facilitate identification of altered identifying codes.
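One way a catalog might derive the red/yellow/green rating, and a target node might use it when choosing a source, is sketched below. The numeric scale of the trustworthy-measure (0 to 1), the two thresholds, and the function names are all hypothetical; the description only states that a rating is based on the measure.

```python
from typing import Dict, List


def rating(trust: float, danger_below: float = 0.4, caution_below: float = 0.7) -> str:
    """Map a trustworthy-measure to a catalog shading (thresholds are assumptions)."""
    if trust < danger_below:
        return "red"
    if trust < caution_below:
        return "yellow"
    return "green"


def preferred_sources(trust_by_node: Dict[str, float]) -> List[str]:
    """Candidate sources ordered most-trusted first, skipping any rated red."""
    usable = [node for node, trust in trust_by_node.items() if rating(trust) != "red"]
    return sorted(usable, key=lambda node: trust_by_node[node], reverse=True)
```

A target node holding several catalog entries for the same file could, under these assumptions, request the file from the first node returned by `preferred_sources`.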

FIGS. 2A-2B illustrate example flow diagrams of a modification-monitoring process in accordance with this invention. FIG. 2A corresponds to the above detailed example process of a target node 120, and FIG. 2B corresponds to an example process of the administrator node 130. The example process of FIG. 2B illustrates a modification-detection scheme for determining the source of modified material, whereas the example processes of FIGS. 3 and 4 illustrate a conflict-resolution scheme for determining the original source of corrupt material.

At 210, in FIG. 2A, the target node requests content material from a source node, typically in the form of a computer file. The source node transmits the content material and its identifying code, which are received by the target, at 220. Alternatively, the identifying code may be obtained from the catalog, as discussed above. In this and the following examples, a control-sum-code (CSC) is used as the example identifying code. At 230, the target computes a corresponding code CSC′, and compares it to the identifying code CSC that was received from the source node, or from the catalog, at 232. If these codes CSC, CSC′ correspond, the process terminates, at 234. If the codes CSC, CSC′ do not correspond, the above process is repeated, at 236, to verify that the difference was not caused by a communication error. When the target determines that the difference was not caused by a communication error, and therefore implies a distortion of the content at the source node, the target node transmits an error report to an administrator node.
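The target-node flow of FIG. 2A can be sketched roughly as below. The sketch assumes a SHA-256 code in place of the CSC, a caller-supplied `download` function standing in for steps 210-220, and a limit of two attempts; these names and values are illustrative assumptions, not taken from the description.

```python
import hashlib
from typing import Callable, Dict, Optional


def compute_code(content: bytes) -> str:
    """Stand-in for the identifying-code algorithm (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def fetch_and_check(download: Callable[[], bytes], identifying_code: str,
                    max_attempts: int = 2) -> Optional[Dict[str, str]]:
    """Return None if any attempt matches; otherwise the data for an error report."""
    computed = ""
    for _ in range(max_attempts):
        content = download()              # steps 210-220: request and receive
        computed = compute_code(content)  # step 230: compute CSC'
        if computed == identifying_code:  # step 232: compare with CSC
            return None                   # step 234: codes correspond, done
        # step 236: mismatch, so repeat to rule out a communication error
    return {"expected": identifying_code, "computed": computed}
```

A transient channel error yields a match on the second attempt and no report; a persistent mismatch yields the report that would be sent to the administrator node.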

At 250, in FIG. 2B, the administrator node receives the error report, which identifies the content file, the source, and the code CSC′ computed by the reporting target node. The administrator requests the same content from the source, at 260, and receives the content from the source and the original identifying code CSC from either the source or the catalog, at 270. At 280, the administrator independently computes a corresponding verification code CSC″ based on the received content, using the same algorithm that was used to create the original code CSC. If, at 285, the newly computed verification code CSC″ does not correspond to the original code CSC, the administrator node penalizes the source node, at 290, typically by degrading the trustworthy-measure associated with the source node. Not shown in FIG. 2B, before penalizing the source node, the administrator node may repeat the download process to exclude communication errors, or it may compare its computed verification code CSC″ with the computed code CSC′ reported by the target node, to verify consistency.

Optionally, at 295, if the newly computed verification code CSC″ corresponds to the original identifying code CSC from the source, the administrator node may penalize the reporting target node for filing a false report.
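The administrator-side check of FIG. 2B (steps 280-295) might look like the following sketch. The trust table as a dictionary of floats, the fixed penalty of 0.1, and the function names are assumptions for illustration; SHA-256 again stands in for the CSC algorithm.

```python
import hashlib
from typing import Dict


def compute_code(content: bytes) -> str:
    """Stand-in for the identifying-code algorithm (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def verify_report(content_from_source: bytes, identifying_code: str,
                  trust: Dict[str, float], source: str, reporter: str,
                  penalty: float = 0.1, penalize_false_reports: bool = True) -> str:
    """Recompute the code on the administrator's own download and adjust trust."""
    verification_code = compute_code(content_from_source)  # step 280: CSC''
    if verification_code != identifying_code:              # step 285
        trust[source] = max(0.0, trust[source] - penalty)  # step 290
        return "source_penalized"
    if penalize_false_reports:                             # step 295 (optional)
        trust[reporter] = max(0.0, trust[reporter] - penalty)
        return "reporter_penalized"
    return "no_action"
```

The `penalize_false_reports` flag reflects that the penalty against the reporting node is described as optional.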

As noted above, a target node may also submit an error report when the target node subsequently discovers that the content of the file is different-than-expected, hereinafter termed “corrupted” content. As above, the error report includes an identification of the source node, an identification of the file, and optionally, the computed identifying code. Presumably, this computed code corresponds to the original identifying code, because otherwise a modification of the file would have been reported, as detailed above. That is, in accordance with this invention, if a node purposely modifies the content of a file, the node will be forced to generate a new identifying code that corresponds to the new/corrupted content, to avoid immediate detection by a target node using the above modification-detection scheme.

Upon receipt of this corruption-error report, the administrator node has two tasks: determining the root source of the reportedly-corrupted file, and determining whether the reportedly-corrupted file is, in fact, corrupt. As noted above, a corrupted file may be widely distributed before the corruption is identified, and, in a conventional system, identifying the source of corrupt content is extremely difficult. In accordance with the principles of this invention, however, the identifying code facilitates identifying the root source of corrupt content.

FIG. 3 illustrates an example flow diagram of a conflict-localization process in accordance with this invention. In FIG. 2B, it is assumed that the administrator merely had to decide whether the target's report was accurate. In reality, the source may have provided content that had been modified/corrupted previously, but not previously detected.

In a straightforward embodiment of this invention, because each differing version of a copy of content material is identified by a different identifying code, the administrator node can find the source of the corrupted version by analyzing prior versions of the catalog, to determine the first supplier of this version of the content material, as identified by the identifying code. Often, however, the administrator node may not be the sole controller of items introduced onto the network, and/or the administrator may not be configured to retain an exhaustive knowledge of the history of each published catalog, and/or the administrator may not be configured to produce the catalog at all.

In accordance with a second aspect of the invention, the administrator node is configured to explicitly determine the source of a corrupted file, based on somewhat incomplete information. In accordance with this aspect of the invention, the administrator node notifies the source of the reported corruption, at 320, and awaits a response, at 325. If the source fails to respond within a given time interval, the administrator concludes that the corruption report is true, and penalizes the source. Not illustrated in FIG. 3, if a source admits to having supplied known-corrupt content material, the source is penalized, at 330.

If the source responds, the source will either concur or disagree with the report. Generally, when the source concurs with the report, the source also disclaims responsibility for the corruption, and identifies the prior source from which this source obtained the content material, at 340. In effect, the source provides a belated corruption report, identifying the prior source as the source of the corrupted file. The administrator repeats the notification process 320, using this prior source as the new current source. This back-tracking process 340-320 repeats, with each new source identifying its prior source, until the latest identified source fails to respond, and is penalized, at 330, or until the latest identified source disagrees with the reported corruption, at 325, and the administrator must resolve the conflict, at 350. Not illustrated, the administrator is also configured to provide conflict resolution at 350 when the administrator determines that the backtracking process 340-320 enters a continuous loop, wherein the true originator of the corruption falsely represents that a recipient of the corrupted material provided this material.
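The back-tracking of FIG. 3 can be sketched as a loop over node responses. In this illustration each node's reply is modeled as a tuple looked up in a dictionary, with a missing entry standing for "no response within the time interval"; that encoding, the response labels, and the function name are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical reply shapes: ("admit", None), ("disagree", None), ("concur", prior_node)
Response = Tuple[str, Optional[str]]


def localize(first_source: str, responses: Dict[str, Response]) -> Tuple[str, str]:
    """Follow the chain of prior sources; return (action, node)."""
    visited = set()
    current = first_source
    while True:
        if current in visited:                  # back-tracking entered a loop
            return ("resolve_conflict", current)    # step 350
        visited.add(current)
        response = responses.get(current)       # steps 320-325: notify and wait
        if response is None:                    # no response in time
            return ("penalize", current)        # step 330
        kind, prior = response
        if kind == "admit":
            return ("penalize", current)        # step 330
        if kind == "disagree":
            return ("resolve_conflict", current)    # step 350
        if kind == "concur" and prior is not None:
            current = prior                     # step 340: move to the prior source
            continue
        raise ValueError(f"unrecognized response from {current!r}: {response!r}")
```

For example, with `{"C": ("concur", "B"), "B": ("concur", "A")}` and no entry for `"A"`, starting at `"C"` ends with `("penalize", "A")`, since the last identified source never responds.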

FIG. 4 illustrates an example flow diagram of a conflict-resolution process in accordance with this invention. At 410, the source may deny being the provider of the content material. In a preferred embodiment of this invention, the administrator has access to prior local regional content catalogs and tables, which identify files offered by each node over time, and the corresponding identifying code. At 420, the administrator checks these catalogs and tables to verify the source's claim of non-ownership. If, at 430, the source had owned the subject content material with the corresponding identifying code, then the source's denial is deemed false, and the source is penalized, at 490; otherwise, the node that reported this source node as the provider of the corrupt content material is optionally penalized, at 495.
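The ownership check at 420-430 amounts to a lookup in the retained catalog history. The sketch below assumes that history is available as a mapping from node to the set of (file, identifying code) pairs the node has offered; this data layout and the names used are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical layout: node -> {(file_id, identifying_code), ...} offered over time
History = Dict[str, Set[Tuple[str, str]]]


def resolve_denial(history: History, accused: str,
                   file_id: str, identifying_code: str) -> str:
    """Check a source's claim that it never offered this version of the file."""
    offered = history.get(accused, set())
    if (file_id, identifying_code) in offered:
        return "penalize_source"    # step 490: the denial is deemed false
    return "penalize_accuser"       # step 495: optional, per the description
```

Matching on the identifying code as well as the file name matters here, because a node may have offered a different, uncorrupted version of the same title.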

Alternatively at 410, the source may dispute the assertion that the content material is corrupted, at which point the administrator effects a reliability check, at 440. The reliability check may address the reliability of the content material, or the reliability of the source node, or both. At 450, the administrator assesses the reliability of the content material. This can be performed by comparing the content material to other copies of the same content material, or, if available, to a known trusted copy of the content material. This assessment may be performed autonomously, if other copies of the content material can be located and a decision reached, or it may be performed with human intervention, wherein the administrator presents the evidence to a human arbitrator who decides whether the evidence is persuasive one way or the other. In the case of a corrupted song or video, for example, the arbitrator is provided the opportunity to hear/view the content.

In an alternative embodiment of this invention, the administrator may purposely distribute known-good content material to nodes of the network, as reliability-testing content. When the administrator receives a report of a distorted copy of this reliability-testing content material, the evidence against the node that first distributes the modified copy is fairly conclusive, justifying a somewhat severe penalty.

If the content material is found not to be distorted, the administrator optionally penalizes the node that reported the material as distorted, at 495; otherwise, if the content material is found to be distorted, the reported source is penalized, at 490.

At 460, the administrator assesses the reliabilities of the reporter and the source. Generally, this assessment is performed if the administrator is unable to ascertain whether a modification/corruption has actually been made to the original content material, and/or if the determination of the true root-source of the material is inconclusive. In a preferred embodiment of this invention, the administrator is configured to presume that the identified root source has modified the content material. Countering the assumption that the source is at fault, the administrator also considers other factors, such as the current trustworthy-measures of the source node and the reporting node, the length of time that each of the source and reporting nodes have been part of the network, the amount of traffic handled by each of the source and reporting nodes, and so on.

If the source node is determined to be inherently more reliable than the reporting node, the reporting node is optionally penalized, at 495; otherwise, the source node is penalized, at 490. Not illustrated, if the administrator is unable to conclusively assess the reliability of the content material or the source node, no penalty actions are taken for the current report.
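The reliability comparison at 460 could be approximated by a weighted score over the factors listed above. Everything numeric in this sketch is an assumption: the weights, the one-year and 1000-file saturation points, and the margin that encodes the presumption against the source. The description names the factors but not how they are combined.

```python
from dataclasses import dataclass


@dataclass
class NodeRecord:
    trust: float           # current trustworthy-measure, assumed to lie in [0, 1]
    days_in_network: int   # longevity within the network
    files_served: int      # rough proxy for traffic handled


def reliability(node: NodeRecord) -> float:
    """Weighted score in [0, 1]; weights and saturation points are hypothetical."""
    longevity = min(node.days_in_network / 365, 1.0)
    traffic = min(node.files_served / 1000, 1.0)
    return 0.6 * node.trust + 0.2 * longevity + 0.2 * traffic


def decide(source: NodeRecord, reporter: NodeRecord, presumption: float = 0.1) -> str:
    """The source is presumed at fault unless it is clearly the more reliable node."""
    if reliability(source) > reliability(reporter) + presumption:
        return "penalize_reporter"  # step 495 (optional)
    return "penalize_source"        # step 490
```

The `presumption` margin is one simple way to express that the source is presumed at fault: when the two nodes score equally, the source is the one penalized.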

The foregoing merely illustrates the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are thus within the spirit and scope of the following claims.

1. A method of affecting a trustworthy-measure associated with a source node in a distributed network, the method comprising: receiving an information file from the source node and a corresponding identifying code that is based on content of the information file when the information file is introduced to the network, and computing an associated code based on received content of the information file; comparing the associated code with the identifying code; transmitting an error report to an administrator node, the error report identifying the source node and the information file, when at least one of the following occur: the associated code does not correspond to the identifying code, and the content of the information file is abnormal; verifying the error report by the administrator node; and reducing the value of the trustworthy-measure associated with the source node in response to the administrator node verifying the error report, thereby providing the reduced-value trustworthy measure for evaluating subsequent use of the source node; wherein transmitting an error report includes transmitting an error report in response to the step of comparing indicating that a difference between the associated code and the identifying code is not caused by a communication error, and further including: repeating the receiving, computing, and comparing steps prior to transmitting the error report.
2. The method of claim 1, wherein the identifying code includes at least one of: a control-sum-code, and a hash-value, and wherein verifying the error report by the administrator node includes receiving the information file from the source node by the administrator node, computing a verification code based on content of the information file received by the administrator node, comparing the verification code with the identifying code, and verifying the error report when the verification code does not correspond to the identifying code.
3. A method of facilitating control of distribution of modified or corrupted files in a distributed network, the method comprising: providing a catalog of available files to nodes of the distributed network, the catalog identifying each file of the available files and a corresponding source node of each file, processing an error report from a target node that received a downloaded file from a selected source node, verifying the error report, degrading a trustworthy-measure of at least one node of the distributed network based on a result of verifying the error report, and providing the trustworthy-measure of the at least one node to other nodes of the distributed network; wherein verifying the error report is based upon an identifying code corresponding to an original version of the downloaded file, and verifying the error report includes receiving the downloaded file from the selected source node by an administrator node, computing a verification code based on content of the downloaded file received by the administrator node, comparing the verification code with the identifying code, and verifying the error report when the verification code does not correspond to the identifying code.
4. The method of claim 3, wherein the catalog includes the identifying code.
5. A method of controlling a trustworthy-measure associated with a source node in a distributed network, the method comprising: receiving, from a reporting node, a report of a modification or corruption of an information file by the source node, determining a validity of the report, and degrading the trustworthy-measure associated with the source node when the report is determined to be valid; wherein determining the validity of the report includes notifying the source node of the report, and assessing a response from the source node to determine the validity of the report; and wherein receiving a report of a modification or corruption of an information file by the source node includes receiving a report that the modification or corruption was not caused by a communication error, and assessing the response includes: determining that the report is valid if the response is a null-response, or an admittance of effecting the modification or corruption of the information, and revising the report to identify an alternative source of the modification or corruption of the information, if the response includes an acknowledgement of the modification or corruption.
6. The method of claim 5, wherein assessing the response further includes assessing the reliability of at least one of: the information file, the source node, and the reporting node.
7. The method of claim 5, wherein determining the validity of the report further includes determining a reliability of the source node, and determining the reliability of the source node is based on at least one of the trustworthy-measure of the source node, longevity of the source node within the distributed network, traffic flow via the source node, and prior activities of the source node.
8. The method of claim 7, wherein determining the validity of the report also includes determining a reliability of the reporting node, and determining the reliability of the reporting node is based on at least one of: the trustworthy-measure of the reporting node, longevity of the reporting node within the distributed network, traffic flow via the reporting node, and prior activities of the reporting node.
9. The method of claim 5, wherein determining the validity of the report further includes a verification of prior ownership of the information file.
10. A communications network, comprising: a plurality of nodes, including at least a source node, a target node, and an administrator node, the source node having an information file and a corresponding identifying code based on content of the information file at a prior point in time, the target node being configured to: receive the information file and identifying code, transmit a discrepancy report based on at least one of: a discrepancy between the identifying code and a computed code based on received content of the information file, and an abnormality in the information file, and the administrator node being configured to: receive the discrepancy report, verify validity of the discrepancy report, and modify a trustworthy-measure associated with at least one node of the plurality of nodes in response to verifying the validity of the discrepancy report; wherein the administrator node is further configured to verify validity of the discrepancy report prior to modifying the trustworthy-measure by verifying that the discrepancy report is indicative of a modification or corruption of an information file by the source node that is not based upon a communication error.
11. The communications network of claim 10, wherein the administrator node is further configured to verify validity of the discrepancy report by: receiving the information file from the source node, and determining a verification code based on received content of the information file, and comparing the verification code to the identifying code.
12. The communications network of claim 10, wherein the administrator node is further configured to verify validity of the discrepancy report based on at least one of: a reliability of the received content of the information file, a record of prior ownership of the information file, a reliability of the source node, a reliability of the reporting node, a longevity of the source node within the network, a longevity of the reporting node within the network, prior activities of the source node within the network, and prior activities of the reporting node within the network.
13. The communications network of claim 12, wherein the trustworthy-measure of the source node is available for access by each of the plurality of nodes, to facilitate control of subsequent distribution of files from the source node based on the trustworthy-measure.
14. An administrator node in a distributed communications network for exchanging information files among a plurality of nodes, the administrator node configured to: receive a discrepancy report from a reporting node, the discrepancy report identifying a source node and an information file, verify the discrepancy report, and modify a trustworthy-measure associated with at least one node of the plurality of nodes, based on whether the discrepancy report is valid; and wherein the discrepancy report is based on a comparison of a code computed by the reporting node to an identifying code corresponding to contents of the information file at a prior time to determine that the discrepancy report identifies a discrepancy that is not due to a communication error, the administrator node is configured to verify the discrepancy report by: receiving the information file from the source node, and determining a verification code based on received content of the information file, and comparing the verification code to the identifying code.
15. The administrator node of claim 14, wherein the administrator node is configured to verify the discrepancy report based on at least one of a reliability of the received content of the information file, a record of prior ownership of the information file, a reliability of the source node, a reliability of the reporting node, a longevity of the source node within the network, a longevity of the reporting node within the network, prior activities of the source node within the network, and prior activities of the reporting node within the network.
16. The administrator node of claim 14, wherein the administrator node is further configured to provide a catalog that identifies a plurality of information files and corresponding source nodes.
17. The administrator node of claim 16, wherein the catalog further includes a parameter based on the trustworthy-measure of the at least one node.
18. The method of claim 1, wherein repeating the receiving, computing, and comparing steps prior to transmitting the error report is used to determine whether information file errors were caused during or prior to communication of the information file from the source node.
19. The method of claim 18, further comprising preventing transmitting the error report upon determining that the information file errors were caused during communication.
20. A method of facilitating control of distribution of modified or corrupted files in a distributed network, the method comprising: providing a catalog of available files to nodes of the distributed network, the catalog identifying each file of the available files and a corresponding source node of each file, processing an error report from a target node that received a downloaded file from a selected source node, verifying the error report, degrading a trustworthy-measure of at least one node of the distributed network based on a result of verifying the error report, and providing the trustworthy-measure of the at least one node to other nodes of the distributed network; wherein verifying the error report includes determining an originator node responsible for modifications to the downloaded file giving rise to the error report, wherein determining the originator node includes notifying the selected source node, and assessing a response from the selected source node to determine the validity of the error report; and wherein assessing the response includes determining that the error report is valid if the response is a null-response or an admittance of causing the modifications to the downloaded file, and revising the report to identify an alternative source of the modifications to the downloaded file if the response includes an acknowledgement of the modifications.